Efficient classical simulation of the approximate quantum Fourier transform 
We present a method for classically simulating quantum circuits based on the tensor contraction 
model of Markov and Shi (quant-ph/0511069). Using this method we are able to classically simulate 
the approximate quantum Fourier transform in polynomial time. Moreover, our approach allows us 
to formulate a condition for the composability of simulable quantum circuits. We use this condition 
to show that any circuit composed of a constant number of approximate quantum Fourier transform 
circuits and log depth circuits with limited interaction range can also be efficiently simulated. 
One of the most useful ways of investigating the power, and
limitations, of quantum computation is to identify classes of
quantum algorithms which can be efficiently simulated on a
classical computer. A well-known example of such a class is that
of circuits composed of Clifford group operations, which are
shown by the Gottesman-Knill theorem [1] to be efficiently
simulable. Recently a number of new methods for simulating
quantum computation have appeared, both for the circuit model
and the measurement-based model of quantum computation. Unlike
the Gottesman-Knill theorem, which is based on restricting the
set of allowed gates, all these new methods rely on the topology
(the 'graph' of connections) of the simulated circuit. In
particular, two of these new methods, due to Jozsa [2] and
Markov and Shi [3], both use the formalism of tensor contraction
for simulations in the quantum circuit model, which is the focus
of this paper.

We base our approach on Markov and Shi's formalism, 
which has the advantage of being able to simulate gener- 
alised quantum dynamics and mixed states (this would 
be particularly useful in simulating noisy gates), as well 
as working directly with the natural graph of the circuit. 
Using this approach, we show how the approximate quan- 
tum Fourier transform (AQFT) can be efficiently simu- 
lated on a classical computer (i.e. simulated in a time 
polynomial in the number of input qubits). Additionally, 
our method allows us to formulate a simple condition for 
the composability of two simulable circuits. That is, if 
the simulation procedures for two circuits obey a partic- 
ular condition, we are assured that the composed circuit 
(created by connecting the outputs of one to the inputs 
of the other) will also be efficiently simulable. We use 
this condition to show that any circuit composed of a 
constant number of AQFT circuits and log depth quantum 
circuits with bounded interaction range can be efficiently 
simulated on a classical computer. Obviously, this im- 
plies that the AQFT can be efficiently simulated when 
applied to any state produced by such circuits. 

Simulating Quantum Computation. In order to simulate a quantum
computation, we first associate a graph with the circuit in the
obvious way, representing each input qubit, gate, and output
qubit by a vertex, and each wire by an edge (e.g. a two-qubit
gate would correspond to a vertex of degree four). Next, we
label each edge with a different index ($i$, $j$, $k$, etc.).
Each index ranges over four possible values, corresponding to
the four components of a qubit's density operator. Finally, to
each vertex we associate a tensor describing the operation
performed at that point. This tensor has indices corresponding
to all edges connected to that vertex (so that its rank is equal
to the degree of the vertex). For clarity, we use raised indices
to denote output wires, and lowered indices to denote input
wires.

Following Markov and Shi's approach [3], we associate tensors
with basic circuit elements as follows, using the operator basis
$e = \{|0\rangle\langle 0|, |0\rangle\langle 1|, |1\rangle\langle 0|, |1\rangle\langle 1|\}$
for single qubits, and $e_{ij} = e_i \otimes e_j$ for two qubits:



1. Inputting a qubit in state $\rho$:

$$T^{i} = \mathrm{tr}(e_i^\dagger \rho) \qquad (1)$$

2. Performing a single-qubit operation $\rho \to G[\rho]$:

$$T^{j}_{i} = \mathrm{tr}(e_j^\dagger G[e_i]) \qquad (2)$$

3. Performing a two-qubit operation $\rho \to G'[\rho]$:

$$T^{kl}_{ij} = \mathrm{tr}(e_{kl}^\dagger G'[e_{ij}]) \qquad (3)$$

4. Obtaining a measurement result corresponding to
a generalised measurement (POVM) operator $E$:

$$T_{i} = \mathrm{tr}(E e_i) \qquad (4)$$

5. Discarding a qubit, or obtaining an unspecified
measurement result:

$$T_{i} = \mathrm{tr}(e_i) \qquad (5)$$



Note that these examples can easily be extended to 
apply to joint input states or measurements, gates acting 
on more qubits, or gates with different numbers of inputs 
and outputs. Tensors could even be introduced to rep- 
resent non-physical (i.e. not completely positive) linear 
operations if desired. 
FIG. 1: The graph and tensors associated with a simple example
circuit described below, in which a two-qubit unitary gate acts
on the state $\rho_1 \otimes \rho_2$, and the first qubit is
measured to be in state $|0\rangle$.



Once a tensor with the appropriate indices has been assigned to
each vertex, taking their product and summing over all indices
will yield the probability of obtaining the specified
measurement result. This process is illustrated below for the
simple circuit shown in Fig. 1, in which a two-qubit unitary
gate $G[\rho] = U \rho U^\dagger$ acts on the separable input
state $\rho_1 \otimes \rho_2$, and the probability $p$ of the
first qubit being found in the state $|0\rangle$ is obtained.

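$$\begin{aligned}
p &= \sum_{ijkl} T^{i}\, T^{j}\, T^{kl}_{ij}\, T_{k}\, T_{l} \qquad (6) \\
  &= \sum_{ijkl} \mathrm{tr}(e_i^\dagger \rho_1)\, \mathrm{tr}(e_j^\dagger \rho_2)\, \mathrm{tr}(e_{kl}^\dagger U e_{ij} U^\dagger)\, \mathrm{tr}(|0\rangle\langle 0| e_k)\, \mathrm{tr}(e_l) \\
  &= \sum_{kl} \mathrm{tr}(e_{kl}^\dagger U (\rho_1 \otimes \rho_2) U^\dagger)\, \mathrm{tr}((|0\rangle\langle 0| \otimes I)\, e_{kl}) \\
  &= \mathrm{tr}((|0\rangle\langle 0| \otimes I)\, U (\rho_1 \otimes \rho_2) U^\dagger)
\end{aligned}$$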
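The worked example can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly chosen unitary and input states; the helper names (`input_tensor`, `two_qubit_gate_tensor`, `measurement_tensor`, `random_density_matrix`) are hypothetical and introduced only for illustration. It builds the tensors of equations (1), (3), (4) and (5) in the operator basis and compares the full contraction of equation (6) with the direct trace formula.

```python
import numpy as np

# Operator basis e_0..e_3 = |0><0|, |0><1|, |1><0|, |1><1| for one qubit.
basis = np.zeros((4, 2, 2), dtype=complex)
for idx, (r, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    basis[idx, r, c] = 1.0

def input_tensor(rho):
    """T^i = tr(e_i^dagger rho), as in equation (1)."""
    return np.array([np.trace(e.conj().T @ rho) for e in basis])

def two_qubit_gate_tensor(U):
    """T^{kl}_{ij} = tr(e_kl^dagger U e_ij U^dagger), stored as T[i, j, k, l]."""
    T = np.zeros((4, 4, 4, 4), dtype=complex)
    for i in range(4):
        for j in range(4):
            out = U @ np.kron(basis[i], basis[j]) @ U.conj().T
            for k in range(4):
                for l in range(4):
                    e_kl = np.kron(basis[k], basis[l])
                    T[i, j, k, l] = np.trace(e_kl.conj().T @ out)
    return T

def measurement_tensor(E):
    """T_i = tr(E e_i), equation (4); E = identity reproduces equation (5)."""
    return np.array([np.trace(E @ e) for e in basis])

def random_density_matrix(rng):
    """Hypothetical helper: a random single-qubit mixed state."""
    a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    rho = a @ a.conj().T
    return rho / np.trace(rho)

rng = np.random.default_rng(0)
rho1, rho2 = random_density_matrix(rng), random_density_matrix(rng)
# Random two-qubit unitary from a QR decomposition (an arbitrary test case).
U, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # projector |0><0|

# Equation (6): sum over all four edge indices of the product of tensors.
p_network = np.einsum(
    "i,j,ijkl,k,l->",
    input_tensor(rho1), input_tensor(rho2), two_qubit_gate_tensor(U),
    measurement_tensor(P0), measurement_tensor(np.eye(2)),
).real

# Direct evaluation of the final line of the derivation.
p_direct = np.trace(
    np.kron(P0, np.eye(2)) @ U @ np.kron(rho1, rho2) @ U.conj().T
).real
```

The two numbers agree to numerical precision because $\rho = \sum_i \mathrm{tr}(e_i^\dagger \rho)\, e_i$ for this orthonormal operator basis, which is the step used between the second and third lines of the derivation.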

Note that in cases where we wish to measure many out- 
put qubits, it may be prohibitive to calculate the proba- 
bility of all possible output strings (as there are exponen- 
tially many possibilities). Instead, a closer analogy to the 
real quantum computation would be to sample from the 
probability distribution to obtain a particular random 
measurement result. This can be achieved by computing 
the probability of measurement outcomes for the first 
qubit (with the measurement results for all other qubits 
unspecified) and randomly selecting a result, then com- 
puting the joint probability of measurement results for 
the second qubit with the chosen result on the first qubit, 
and randomly selecting one, and so on until a particular 
measurement result has been selected for each qubit. 
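The qubit-by-qubit sampling procedure can be sketched as follows. This is only an illustration: the function `outcome_probability` is a hypothetical stand-in for one run of the tensor contraction (here it is evaluated by brute force from a small state vector), and all names are assumptions rather than part of the method itself.

```python
import numpy as np

def outcome_probability(state, partial):
    """Probability that the first len(partial) qubits give the bits in
    `partial`, with all remaining qubits unspecified (summed over).
    Stand-in for a tensor-network evaluation; brute force on a state vector."""
    n = state.size.bit_length() - 1
    amps = state.reshape([2] * n)
    for bit in partial:
        amps = amps[bit]  # fix the next qubit, keep the others free
    return float(np.sum(np.abs(amps) ** 2))

def sample_outcome(state, rng):
    """Draw one measurement string, one qubit at a time."""
    n = state.size.bit_length() - 1
    chosen = []
    p_prefix = 1.0  # probability of the bits chosen so far
    for _ in range(n):
        p0 = outcome_probability(state, chosen + [0])
        # Conditional probability of 0 given the earlier results.
        bit = 0 if rng.random() < p0 / p_prefix else 1
        chosen.append(bit)
        p_prefix = p0 if bit == 0 else p_prefix - p0
    return chosen

# Example: the three-qubit state (|000> + |111>)/sqrt(2).
ghz = np.zeros(8, dtype=complex)
ghz[0] = ghz[7] = 1 / np.sqrt(2)
samples = [sample_outcome(ghz, np.random.default_rng(seed)) for seed in range(20)]
```

Each sample needs only one probability evaluation per qubit, rather than one per output string, since the probability of the other bit value follows by subtraction from the probability of the prefix.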

The problem with summing over all tensor indices at the same
time (as written in equation (6)) is that there are
exponentially many terms, making the computation very slow. To
avoid this, we 'contract' the tensors together one at a time,
breaking the joint sum into a series of separate sums. In each
step of the computation we replace two existing tensors with a
new tensor obtained by summing over any repeated indices (e.g.
$\sum_k T^{ij}_{k} T^{k}_{l} \to T^{ij}_{l}$). We repeat this
procedure until we are left with a single tensor with no free
indices, which is the desired probability. The aim is to order
the contractions so that we never generate tensors with too many
indices during this process.
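The difference between the joint sum and a sequence of pairwise contractions can be seen on a toy network. The sketch below is an assumption-laden illustration (a simple chain of random rank-2 tensors with index dimension 4, not a circuit from this paper): the joint sum visits $4^{k+1}$ index assignments, while the pairwise version only ever holds a rank-1 tensor.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 6  # index dimension and number of rank-2 tensors in the chain
first = rng.normal(size=d)
last = rng.normal(size=d)
links = [rng.normal(size=(d, d)) for _ in range(k)]

def joint_sum():
    """Sum over all k + 1 indices at once: d ** (k + 1) terms."""
    total = 0.0
    for idx in itertools.product(range(d), repeat=k + 1):
        term = first[idx[0]] * last[idx[-1]]
        for t, link in enumerate(links):
            term *= link[idx[t], idx[t + 1]]
        total += term
    return total

def pairwise_contraction():
    """Contract one tensor at a time; the intermediate tensor has rank 1."""
    current = first
    for link in links:
        current = np.einsum("a,ab->b", current, link)
    return float(current @ last)

value_joint = joint_sum()
value_pairwise = pairwise_contraction()
```

Both functions return the same number, but the pairwise version performs on the order of $k d^2$ multiplications instead of $d^{k+1}$.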



In Markov and Shi's paper, they describe this contraction
process by an ordering on the edges of the graph (i.e. on index
summations). Here, however, we take a different approach, in
which the contraction process is described by a sequence of sets
of vertices $S = (s^1, \ldots, s^N)$, each of which corresponds
to a particular tensor that is generated during the computation.
This allows us to formulate a condition for efficient simulation
of composite circuits.

The tensor corresponding to a set of vertices $s$ is that
generated by contracting together all initial tensors
corresponding to vertices in $s$. In each step of the
contraction process we take two existing tensors and generate a
new one, so each set $s^i \in S$ is either the union of two
previous sets, or one previous set and a vertex, or two
vertices. Denoting the set of all vertices by $V$:

$$s^i = t^i_1 \cup t^i_2, \quad \text{where } t^i_k = \begin{cases} s^j, & j < i, \\ \text{or } \{v\}, & v \in V. \end{cases} \qquad (7)$$

The calculation of the probability is done in $N$ steps, where
in step $i$ we compute a new tensor by summing over all indices
corresponding to edges connecting $t^i_1$ to $t^i_2$. For the
computation to be complete, we require that the final set
$s^N = V$. Note that sampling from the output probability
distribution for many qubits as described above only requires
changing the measurement operators applied to the outputs, and
hence each run can use the same graph and contraction sequence
$S$.

The computational difficulty of the simulation is determined by
the maximal rank of the tensors generated during the
computation. For each $s^i$ in $S$ we therefore define $E^i$ as
the number of edges that connect vertices in $s^i$ to vertices
outside $s^i$, which is exactly the rank of the tensor
corresponding to $s^i$. The simulation corresponding to the
sequence $S$ will be an efficient one if
$E^{\max} = \max_i E^i = O(\log n)$. Since each index takes four
values, a tensor of rank $E$ has $4^E$ components, so this
condition assures us that the maximal number of components for
each tensor we compute is $O(\mathrm{poly}(n))$. Furthermore,
the maximal number of terms summed over when computing each
component must also be $O(\mathrm{poly}(n))$, as the two sets
$t^i_1$ and $t^i_2$ which are combined to form $s^i$ can only
connect on at most $E^{\max} = O(\log n)$ edges.
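The quantities $E^i$ and $E^{\max}$ are straightforward to compute for a given graph and sequence. The following minimal sketch uses hypothetical function names and, as its example, the five-vertex graph of the simple two-qubit circuit of Fig. 1 (two inputs, one gate, two measurement vertices); the vertex labels are assumptions made for illustration.

```python
def boundary_rank(edges, vertex_set):
    """E for one set: edges with exactly one endpoint inside the set."""
    return sum((u in vertex_set) != (v in vertex_set) for u, v in edges)

def max_rank(edges, sequence):
    """E^max = max_i E^i over all sets in the contraction sequence."""
    return max(boundary_rank(edges, s) for s in sequence)

def is_valid_sequence(vertices, sequence):
    """Check the rule of equation (7): every set is the disjoint union of two
    earlier sets or single vertices, and the final set is all of V."""
    available = [frozenset([v]) for v in vertices]
    for s in sequence:
        s = frozenset(s)
        if not any(a | b == s and not (a & b)
                   for a in available for b in available):
            return False
        available.append(s)
    return frozenset(sequence[-1]) == frozenset(vertices)

# Graph of the example circuit: two inputs, one gate, two measurements.
vertices = ["in1", "in2", "G", "m1", "m2"]
edges = [("in1", "G"), ("in2", "G"), ("G", "m1"), ("G", "m2")]
sequence = [
    {"in1", "G"},
    {"in1", "in2", "G"},
    {"in1", "in2", "G", "m1"},
    {"in1", "in2", "G", "m1", "m2"},
]
ranks = [boundary_rank(edges, s) for s in sequence]
```

For this sequence the ranks decrease from 3 to 0, the last value corresponding to the scalar probability obtained once every vertex has been contracted.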

A class of circuits that is easy to simulate efficiently using
this approach is that of log-depth circuits with bounded
interaction range (i.e. involving $d = O(\log n)$ timesteps, in
which gates act on qubits at most a constant distance $r$
apart). To simulate such circuits, we number all vertices
involving qubit 1 first (i.e. gates, inputs and outputs on the
upper horizontal line of the circuit), then all vertices
involving qubit 2 that are not already included, and so on until
we have numbered all the vertices. The sequence $S$ is then
composed of sets containing increasing numbers of vertices in
this ordering (e.g. $(\{v_1\}, \{v_1, v_2\}, \ldots)$). It is
easy to see that for such simulations,
$E^{\max} \le dr = O(\log n)$. The efficient simulability of such
circuits has previously been shown. Furthermore, it was shown by
Jozsa [2], using a slightly different approach [10], that the
same strategy will work for any circuit in which each qubit line
is touched (or crossed) by at most $O(\log n)$ gates.

FIG. 2: The general structure of the circuit calculating the
approximate quantum Fourier transform, showing a measurement of
$|0\rangle$ on the first output qubit. Each box corresponds to a
ladder circuit, with internal gates as shown in Fig. 3.

FIG. 3: The gates composing a ladder circuit, consisting of a
Hadamard gate and then $m - 1$ controlled rotation gates. Note
that the last few ladder circuits are actually slightly smaller,
although they have the same form.

Simulating the Approximate Quantum Fourier Transform. An
important new example of a circuit which can be efficiently
simulated by the above scheme is the approximate quantum Fourier
transform (AQFT). An efficient circuit for the exact Fourier
transform [6] consists of a sequence of $(n - 1)$ 'ladder
circuits' of decreasing size. The $l$th ladder circuit is
composed of a Hadamard gate on the $l$th qubit, followed by
$(n - l)$ conditional phase gates connecting qubit $l$ to qubits
$l + 1, \ldots, n$ respectively. These conditional phase gates
have the form
$R_k = \exp\left((\pi i/2^k)\, |11\rangle\langle 11|\right)$,
i.e. they multiply the $|11\rangle$ component by the phase
$e^{\pi i/2^k}$, where $k$ is the distance over which the gate
acts. However, it was noted by Coppersmith that in many cases an
exact Fourier transform is not necessary, and that a very good
approximation can be obtained by omitting all gates $R_k$ with
$k > m$ (i.e. gates that act over a large distance, and generate
only small phase rotations). In what follows, we take
$m = \log(n/\epsilon)$, yielding an error in the final state of
$O(\epsilon)$. Furthermore, this approximate quantum Fourier
transform is sufficient for the most useful application of the
algorithm, namely estimating periodicity, and hence for use in
Shor's factoring
algorithm. Barenco et al. proved that the AQFT will yield the
same probability of success as the exact periodicity-finding
algorithm after $O(n^3/m^3)$ runs. A diagram of the AQFT circuit
is given in Fig. 2. Note that
in this circuit, the output qubits occur in reverse order 
to the inputs (i.e. starting at the bottom). 
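The ladder structure can be made concrete for small $n$ by building the circuit's matrix explicitly. The sketch below is a brute-force illustration under stated assumptions, not part of the simulation method: it uses NumPy, 0-indexed qubits with qubit 0 as the most significant bit, and hypothetical helper names. With every gate kept it reproduces the exact Fourier matrix with its output bits in reverse order, and lowering `m` drops the long-range gates.

```python
import numpy as np

def apply_hadamard(psi, q):
    """Apply a Hadamard gate to axis q of an n-qubit amplitude tensor."""
    psi = np.moveaxis(psi, q, 0)
    psi = np.stack([psi[0] + psi[1], psi[0] - psi[1]]) / np.sqrt(2)
    return np.moveaxis(psi, 0, q)

def apply_conditional_phase(psi, q1, q2, angle):
    """Multiply the |11> component of qubits q1, q2 by exp(i * angle)."""
    psi = psi.copy()
    index = [slice(None)] * psi.ndim
    index[q1] = 1
    index[q2] = 1
    psi[tuple(index)] *= np.exp(1j * angle)
    return psi

def aqft_matrix(n, m):
    """Matrix of the ladder circuit, keeping only gates R_k with k <= m."""
    dim = 2 ** n
    columns = []
    for x in range(dim):
        psi = np.zeros(dim, dtype=complex)
        psi[x] = 1.0
        psi = psi.reshape([2] * n)
        for l in range(n):  # one ladder per qubit
            psi = apply_hadamard(psi, l)
            for k in range(1, min(m, n - 1 - l) + 1):
                psi = apply_conditional_phase(psi, l, l + k, np.pi / 2 ** k)
        columns.append(psi.reshape(dim))
    return np.array(columns).T

def bit_reversed_qft(n):
    """Exact Fourier matrix with the output qubits in reverse order."""
    dim = 2 ** n
    x = np.arange(dim)
    F = np.exp(2j * np.pi * np.outer(x, x) / dim) / np.sqrt(dim)
    reverse = list(range(n - 1, -1, -1)) + [n]
    return F.reshape([2] * n + [dim]).transpose(reverse).reshape(dim, dim)

def two_qubit_gate_count(n, m):
    """Number of conditional phase gates kept in the approximate circuit."""
    return sum(min(m, n - 1 - l) for l in range(n))
```

For example, `aqft_matrix(4, 3)` keeps all gates and equals `bit_reversed_qft(4)`, whereas `aqft_matrix(3, 1)` is still unitary but omits the single distance-2 gate. The gate count grows as roughly $nm$ rather than $n^2/2$.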

In order to classically simulate the AQFT circuit we cannot use
the same simple ordering $S$ used for the log-depth circuits
above, as $O((\log n)^2)$ gates cross each qubit line. This
leads to tensors with $n^{O(\log n)}$ elements, that cannot be
computed in polynomial time. Instead, we choose the following
contraction ordering: We first contract together all of the
tensors corresponding to gates in the first ladder circuit (in
any order), then we proceed to do the same for the second ladder
circuit, and so on, until we have one combined tensor for each
ladder circuit (i.e. until $S$ contains sets corresponding to
the vertices in each ladder circuit). Since there are at most
$m = O(\log n)$ two-qubit gates in each ladder circuit, all the
tensors we generate have at most $O(\log n)$ indices.

Next we combine the ladder circuits and their associated inputs
and outputs, one by one in descending order, until we have
contracted together all the remaining tensors. First, we take
the tensor for the top ladder circuit and contract it with all
the tensors of input and output vertices to which it is
connected, in order from top to bottom (i.e. the inputs of the
qubit lines on which it acts, and the output of its top qubit
line). Then we contract this new tensor with the tensor for the
second ladder circuit, and again contract it with any input or
output vertices to which it is connected (i.e. the input of the
one new qubit line it reaches, and the output of its own top
qubit line). We continue to contract each new tensor with that
of the next ladder circuit, and its input and output vertices,
until all tensors have been included, and thus we have computed
the probability of the chosen measurement result.

Note that in each stage of the contraction process, the new
tensors we generate only have at most $E^{\max} = O(\log n)$
free indices, hence storing and computing these tensors requires
only polynomial time and memory space. We have therefore proved
that the approximate quantum Fourier transform can be
efficiently simulated classically.
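The bound on the number of free indices can be checked by building the circuit graph and evaluating the boundary of every set in the ladder-by-ladder sequence. The following is a sketch under stated assumptions: qubits are 0-indexed, there is one ladder per qubit (the last consisting of a Hadamard gate only), each ladder keeps at most $m$ conditional phase gates, the gates inside a ladder are contracted in circuit order, and all function and vertex names are hypothetical.

```python
def aqft_graph(n, m):
    """Edges of the circuit graph, plus the gate vertices of each ladder."""
    last = {q: ("in", q) for q in range(n)}  # latest vertex on each wire
    edges, ladders = [], []
    for l in range(n):
        gates = [("H", l)]
        gates += [("R", l, l + k) for k in range(1, min(m, n - 1 - l) + 1)]
        for g in gates:
            for q in g[1:]:  # the qubit lines this gate acts on
                edges.append((last[q], g))
                last[q] = g
        ladders.append(gates)
    for q in range(n):
        edges.append((last[q], ("out", q)))
    return edges, ladders

def boundary_rank(edges, vertex_set):
    """Number of edges with exactly one endpoint inside the set."""
    return sum((u in vertex_set) != (v in vertex_set) for u, v in edges)

def ladder_sequence(n, m):
    """Sets generated by the ladder-by-ladder contraction ordering."""
    edges, ladders = aqft_graph(n, m)
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    sequence = []
    # Stage 1: build one tensor per ladder, adding one gate at a time.
    for gates in ladders:
        for t in range(2, len(gates) + 1):
            sequence.append(frozenset(gates[:t]))
    # Stage 2: merge ladders from the top, attaching inputs and outputs.
    current = frozenset()
    for gates in ladders:
        current = current | frozenset(gates)
        sequence.append(current)
        terminals = sorted(v for g in gates for v in neighbours[g]
                           if v[0] in ("in", "out"))
        for v in terminals:
            current = current | {v}
            sequence.append(current)
    return edges, sequence

def max_boundary_rank(n, m):
    edges, sequence = ladder_sequence(n, m)
    return max(boundary_rank(edges, s) for s in sequence)
```

With these conventions, `max_boundary_rank(n, m)` evaluates to $2m + 2$ whenever $n > m$, independent of $n$; the maximum is attained by the tensor of a single full ladder, which has two free wires on each of its $m + 1$ qubit lines.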

So far we have assumed that the input to the AQFT 
circuit is a product state of n qubits (although this need 
not be a computational basis state). A way to generalize 
this set of possible input states is to identify classes of 
efficiently simulable circuits that can be connected to the 
inputs of the AQFT circuit such that the composed cir- 
cuit is also efficiently simulable. In the next section, we 
therefore consider the simulability of composed circuits. 

Simulating composed circuits. Consider two efficiently simulable
circuits, $A$ and $B$, that we join together to form a composed
circuit $C$ by connecting the output wires of $A$ to the input
wires of $B$ (for simplicity we assume that both are $n$-qubit
circuits; our discussion can be easily generalized to the case
where only some of the inputs and outputs connect). How can we
tell if the composed circuit $C$ is efficiently simulable?

In what follows we will use the subscripts $a$ and $b$ to label
objects belonging to circuits $A$ and $B$ respectively. Let us
also denote the set of output vertices of $A$ by
$V_a^{\mathrm{out}}$ and the set of input vertices of $B$ by
$V_b^{\mathrm{in}}$. Given efficient simulations of $A$ and $B$,
$C$ will be efficiently simulable if the following condition
holds: For any subset of output vertices
$\omega_a = \{v^{\mathrm{out}}_{i_1}, \ldots, v^{\mathrm{out}}_{i_k}\}$,
if there is a set $s_a \in S_a$ which includes exactly this
subset of output vertices (i.e.
$s_a \cap V_a^{\mathrm{out}} = \omega_a$), then there is a set
$s_b \in S_b$ containing exactly the same subset of input
vertices (i.e.
$s_b \cap V_b^{\mathrm{in}} = \{v^{\mathrm{in}}_{i_1}, \ldots, v^{\mathrm{in}}_{i_k}\}$).
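The composability condition is a statement about which subsets of boundary vertices appear in the two sequences, and can be tested directly. The sketch below uses hypothetical names and toy three-qubit circuits with one gate per qubit line; it also assumes that sets containing no boundary vertices impose no constraint.

```python
def boundary_patterns(sequence, terminals):
    """Non-empty subsets of terminal wires (by wire number) that are matched
    exactly by some set in the sequence."""
    wire_of = {v: w for w, v in enumerate(terminals)}
    patterns = set()
    for s in sequence:
        pattern = frozenset(wire_of[v] for v in s if v in wire_of)
        if pattern:
            patterns.add(pattern)
    return patterns

def is_composable(seq_a, outputs_a, seq_b, inputs_b):
    """True if every output pattern of A is also an input pattern of B,
    where outputs_a[w] is wired to inputs_b[w]."""
    return (boundary_patterns(seq_a, outputs_a)
            <= boundary_patterns(seq_b, inputs_b))

def line_by_line(gates, terminals, order):
    """Toy sequence: include each qubit line's gate and terminal in turn."""
    current, sequence = frozenset(), []
    for q in order:
        for v in (gates[q], terminals[q]):
            current = current | {v}
            if len(current) > 1:
                sequence.append(current)
    return sequence

gates_a, outputs_a = ["a0", "a1", "a2"], ["outA0", "outA1", "outA2"]
gates_b, inputs_b = ["b0", "b1", "b2"], ["inB0", "inB1", "inB2"]

seq_a = line_by_line(gates_a, outputs_a, [0, 1, 2])       # top to bottom
seq_b_same = line_by_line(gates_b, inputs_b, [0, 1, 2])   # top to bottom
seq_b_flip = line_by_line(gates_b, inputs_b, [2, 1, 0])   # bottom to top

same_direction = is_composable(seq_a, outputs_a, seq_b_same, inputs_b)
opposite_direction = is_composable(seq_a, outputs_a, seq_b_flip, inputs_b)
```

In this toy case the condition holds when both circuits include their boundary vertices in the same order and fails when the orders are opposite, since the pattern containing only wire 0 never appears in the flipped sequence.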

To prove that this condition is sufficient, we first decompose
all the sets in $S_a$ containing output vertices into their
output and non-output components, writing
$s_a^{ij} = \omega_a^i \cup \mu_a^{ij}$, where
$\omega_a^i \subseteq V_a^{\mathrm{out}}$ denotes a particular
set of output vertices, and $\mu_a^{ij}$ denotes a corresponding
set of non-output vertices. Similarly, we decompose each set
$s_b \in S_b$ containing input vertices in the same way into a
set of input vertices $\eta_b^i$ and their associated non-input
vertices $\mu_b^{ij}$, such that
$s_b^{ij} = \eta_b^i \cup \mu_b^{ij}$.

From our simulation procedure it is clear that any two sets in a
sequence are either disjoint or are such that one includes the
other. It is also clear that the order in which two disjoint
sets are constructed is arbitrary (we can choose which set to
construct first). Therefore, by re-labelling and re-ordering the
sets in $S_a$, we can ensure that all sets not containing output
vertices occur first, and that the remaining sets $s_a^{ij}$
occur in the order of increasing $i$ and then increasing $j$
(e.g. $s_a^{11}, s_a^{12}, s_a^{13}, s_a^{21}, \ldots$), and
similarly for $S_b$ and the input vertices. Furthermore, our
composability condition ensures that we can find sequences of
this form, such that the output vertices in $\omega_a^i$ connect
precisely with the input vertices in $\eta_b^i$.

We construct a sequence of sets $S_c$ for the combined circuit
$C$ as follows: Starting with circuit $A$, we first include all
sets from $S_a$ that do not involve output vertices, then all
sets from $S_b$ that do not involve input vertices. The next set
we include is $\mu_a^{11} \cup \mu_b^{11}$, in which the first
output from $A$ is contracted with an input of $B$ (yielding the
union of two non-output sets). We proceed to evolve this set in
$A$ by including $\mu_a^{1j} \cup \mu_b^{11}$ for
$j = 2, \ldots, j^{\max}$. After this, we shift to evolving
circuit $B$, by including sets
$\mu_a^{1j^{\max}} \cup \mu_b^{1k}$ for
$k = 1, \ldots, k^{\max}$. Then, beginning with
$\mu_a^{21} \cup \mu_b^{1k^{\max}}$, we repeat the above
procedure for $i = 2, \ldots, i^{\max}$ by including
$\mu_a^{ij} \cup \mu_b^{(i-1)k^{\max}}$ for
$j = 1, \ldots, j^{\max}$, then
$\mu_a^{ij^{\max}} \cup \mu_b^{ik}$ for
$k = 1, \ldots, k^{\max}$, until all vertices in the combined
circuit have been included.

The key point is that at any stage in the above process the sets
we construct are either identical to a set in the original
sequences, or composed of a union of two such sets with some
input and output vertices discarded (i.e. those across which the
circuit is connected). It is therefore clear that
$E_c^{\max} \le (E_a^{\max} + E_b^{\max})$, and hence when both
$E_a^{\max}$ and $E_b^{\max}$ are $O(\log n)$, the simulation
process defined by $S_c$ is an efficient one.

These results can be generalised to apply to any con- 
stant number of efficiently simulable circuits connected 
in series. In such cases, the combined circuit will be effi- 
ciently simulable when the above composability condition 
is satisfied across each circuit boundary. 

From the simulation procedures for the AQFT circuit and
log-depth limited range circuits given above, we see that both
the input and output vertices are included sequentially from
bottom to top (i.e.
$\omega_i = \{v_1^{\mathrm{out}}, \ldots, v_i^{\mathrm{out}}\}$
and $\eta_i = \{v_1^{\mathrm{in}}, \ldots, v_i^{\mathrm{in}}\}$).
As each output set from one circuit corresponds exactly to an
input set for the other, these two circuits obey our
composability condition. Furthermore, with $\omega_i$ and
$\eta_i$ defined as above, their simulation sequences do not
need to be rearranged before they are composed. By joining these
two circuits together, we can classically simulate the
approximate quantum Fourier transform on any input state that
can be produced by a log-depth circuit involving limited range
interactions.

Because the outputs of the AQFT circuit occur in re- 
verse order, attaching a circuit afterwards is more tricky, 
but can be achieved by flipping the attached circuit ver- 
tically. In order to satisfy the composability condition, 
tensors in the flipped circuit must still be contracted from 
top to bottom, but this can easily be achieved for both 
types of circuit considered here (since the original circuits 
are also simulable with a bottom to top contraction or- 
dering). We therefore conclude that any circuit which is 
composed of a constant number of AQFT and log-depth 
limited range circuits can be simulated efficiently on a 
classical computer. 
Note added: After the completion of this work, we 
became aware of a very recent paper by Aharonov, Lan- 
dau and Makowski (quant-ph/0611156) which appears to 
simulate an AQFT circuit in time. 